Wine tasting can range from a casual pastime to a lucrative profession. For professional sommeliers, considerable time and training is required to adequately rate wine quality. Intuitively, we expect expert ratings to reflect the underlying chemical composition of the wines. To this end, we hypothesized it would be possible to predict ratings using statistical models based on the chemical makeup of wines.
To determine how accurately expert wine quality ratings can be predicted using a set of easily measured chemical components.
To account for the ordinal nature of wine ratings without assuming continuity, we implemented three cumulative logit models: proportional odds, partial proportional odds, and non-proportional odds. Letting \(Z\) correspond to level of wine rating , \(i\) correspond to a specific wine, and \(x_i\) correspond to the chemical components of wine i, the models are given below:
Where \(\alpha_j > \alpha_i\) for \(j > i\), and \(x^T_{i*}\) and \(x^T_{i**}\) are submatrices of \(x_i\).
The log likelihood for each model was maximized using the BFGS algorithm as implemented in the R package optimx (alternatively, users of our R package could specify any available method implemented in optimx). For models that converged, the predicted classification was taken to be the rating with the highest predicted probability. An S3 predict method was implemented in sommelieR to facilitate these predictions.
We compared models on the following metrics:
For models fit on the white wine training data, only proportional odds models converged to parameter estimates that produced non-negative fitted probabilities for each subject and quality rating. The code for fitting these models is given below (note that the models objects are saved to shorten the compile time for this report):
#get starting values for model beta's
beta.starts <- coef(lm(quality ~ alcohol+ pH + volatile.acidity + residual.sugar,
data = white_train))
#proportional odds, reduced model
white.prop.odds <- partial.prop.odds.mod(y ="quality", in.data = white_train,
prop.odds.formula = ~ alcohol+ pH + volatile.acidity + residual.sugar,
beta.prop.odds.start = beta.starts[c(2:5)],
method = "BFGS")
saveRDS(white.prop.odds, "proportional_odds_models/white.prop.odds.reduced.rds")
#proportional odds, full model
beta.starts <- coef(lm(quality ~ alcohol+ pH + volatile.acidity + residual.sugar + fixed.acidity + citric.acid +
chlorides + free.sulfur.dioxide + total.sulfur.dioxide + density + sulphates,
data = white_train))
white.prop.odds <- partial.prop.odds.mod(y ="quality", in.data = white_train,
prop.odds.formula = ~ alcohol+ pH + volatile.acidity + residual.sugar +
fixed.acidity + citric.acid + chlorides + free.sulfur.dioxide +
total.sulfur.dioxide + density + sulphates,
beta.prop.odds.start = beta.starts[c(2:12)],
method = "BFGS")
saveRDS(white.prop.odds, "proportional_odds_models/white.prop.odds.rds")
Then using the predict method implemented for this class, we predicted the ratings in the testing data as follows:
#load saved models
white.prop.odds.reduced <- readRDS("proportional_odds_models/white.prop.odds.reduced.rds")
white.prop.odds <- readRDS("proportional_odds_models/white.prop.odds.rds")
#predict values
white.preds <- predict(white.prop.odds, white_test)$most.likely
white.preds.reduced <- predict(white.prop.odds.reduced, white_test)$most.likely
The performance for the full and reduced models are given below. Notably, the full model performed better than the reduced, but only very slightly.
| Prediction Accuracy | Kappa | Weighted Kappa | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |
|---|---|---|---|---|---|---|---|---|---|---|
| Proportional Odds (F) | 52.1472 | 0.2126 | 0.4053 | 0 | 0 | 45.3608 | 78.1321 | 19.8864 | 0 | 0 |
| Proportional Odds (R) | 51.7382 | 0.2108 | 0.3993 | 0 | 0 | 51.2027 | 74.9431 | 15.9091 | 0 | 0 |
Likewise, the confusion matrix for the full model, using the plotting functionality implemented in our package, is given below:
#plotting confusion matrix
cmPlot(white.pred.table, "white", pred_first = T, title = "Proportional Odds Model Predicted vs Observed", axis.title.size = 20, axis.text.size = 25)
For models fit to the red training data, both proportional and partial proportional models converged. The results presented for the partial proportional model allow the coefficient for total sulfur dioxide to vary with the level of wine quality. The code for producing these models is given below:
#reduced red wine models
beta.starts <- coef(lm(quality ~ alcohol+ pH + volatile.acidity + sulphates + total.sulfur.dioxide,
data = red_train))
red.partial.prop.reduced <- partial.prop.odds.mod(y ="quality", in.data = red_train,
prop.odds.formula = ~ alcohol + pH+ volatile.acidity + sulphates,
beta.prop.odds.start = beta.starts[2:5],
non.prop.odds.formula = ~total.sulfur.dioxide,
beta.non.prop.odds.start = matrix(rep(beta.starts[6], 5), nrow = 1),
method = "BFGS")
red.prop.odds.reduced <- partial.prop.odds.mod(y ="quality", in.data = red_train,
prop.odds.formula = ~ alcohol + pH + volatile.acidity + sulphates
+ total.sulfur.dioxide,
beta.prop.odds.start = beta.starts[2:6],
method = "BFGS")
saveRDS(red.partial.prop.reduced, "proportional_odds_models/red.partial.prop.reduced.rds")
saveRDS(red.prop.odds.reduced, "proportional_odds_models/red.prop.odds.reduced.rds")
#full models
beta.starts <- coef(lm(quality ~ alcohol+ pH + volatile.acidity + residual.sugar + fixed.acidity + citric.acid +
chlorides + free.sulfur.dioxide + total.sulfur.dioxide + density + sulphates,
data = red_train))
#partial proportional odds model
red.partial.prop <- partial.prop.odds.mod(y ="quality", in.data = red_train,
prop.odds.formula = ~ alcohol+ pH + volatile.acidity + residual.sugar +
fixed.acidity + citric.acid + chlorides + free.sulfur.dioxide +
density + sulphates,
beta.prop.odds.start = beta.starts[c(2:9, 11:12)],
non.prop.odds.formula = ~total.sulfur.dioxide,
beta.non.prop.odds.start = matrix(rep(beta.starts[10], 5), nrow = 1),
method = "BFGS")
#proportional odds model
red.prop.odds <- partial.prop.odds.mod(y ="quality", in.data = red_train,
prop.odds.formula = ~ alcohol+ pH + volatile.acidity + residual.sugar +
fixed.acidity + citric.acid + chlorides + free.sulfur.dioxide +
total.sulfur.dioxide + density + sulphates,
beta.prop.odds.start = beta.starts[2:12],
method = "BFGS")
saveRDS(red.partial.prop, "proportional_odds_models/red.partial.prop.rds")
saveRDS(red.prop.odds, "proportional_odds_models/red.prop.odds.rds")
Predictions on the testing data were again made using our S3 predict method (code as shown for white wines). The performance of the trained models on the testing data are given below. Briefly, the proportional and partial proportional odds models performed similarly, and the full models performed very slightly better than the reduced models.
| Prediction Accuracy | Kappa | Weighted Kappa | 3 | 4 | 5 | 6 | 7 | 8 | |
|---|---|---|---|---|---|---|---|---|---|
| Proportional Odds (F) | 58.9905 | 0.3232 | 0.5258 | 0 | 0 | 72.0588 | 59.8425 | 33.3333 | 0 |
| Proportional Odds (R) | 58.0442 | 0.3065 | 0.4707 | 0 | 0 | 72.0588 | 59.8425 | 25.6410 | 0 |
| Partial Proportional Odds (F) | 58.9905 | 0.3233 | 0.5162 | 0 | 0 | 72.0588 | 60.6299 | 30.7692 | 0 |
| Partial Proportional Odds (R) | 58.3596 | 0.3117 | 0.4742 | 0 | 0 | 72.0588 | 59.8425 | 28.2051 | 0 |
As with the white wines, the confusion matrices for the full proportional and partial proportional red models are shown below:
| Model | Accuracy | Kappa | Weighted Kappa | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| Full Model (Linear Terms) | 58.3596 | 0.3158 | 0.5227 | 0 | 20 | 74.2647 | 55.9055 | 28.2051 | 0 |
| Reduced Model (Linear Terms) | 58.3596 | 0.3193 | 0.4952 | 0 | 10 | 72.7941 | 56.6929 | 33.3333 | 0 |
| Reduced Model (Second Order Terms) | 55.2050 | 0.2702 | 0.4789 | 0 | 0 | 67.6471 | 55.1181 | 33.3333 | 0 |
| Model | Accuracy | Kappa | Weighted Kappa | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| Full Model (Linear Terms) | 54.0900 | 0.2451 | 0.4121 | 0 | 9.375 | 51.2027 | 79.4989 | 15.9091 | 0 | 0 |
| Reduced Model (Linear Terms) | 51.9427 | 0.2201 | 0.4093 | 0 | 3.125 | 54.6392 | 72.6651 | 16.4773 | 0 | 0 |
| Reduced Model (Second Order Terms) | 53.2720 | 0.2396 | 0.4010 | 0 | 3.125 | 54.2955 | 74.4875 | 19.8864 | 0 | 0 |
| Prediction Accuracy | Kappa | Weighted Kappa | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |
|---|---|---|---|---|---|---|---|---|---|---|
| Red Wine | 70.98 | 0.5263 | 0.6168 | 0 | 0 | 83.82 | 70.87 | 53.85 | 0.00 | NA |
| White Wine | 67.28 | 0.4862 | 0.6542 | 0 | 25 | 67.35 | 80.87 | 47.73 | 42.86 | 0 |
Comparisons of the performance of the different modeling approaches are given below:
| Model | Prediction Accuracy | Kappa | Weighted Kappa | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| Random Forest | 70.9800 | 0.5263 | 0.6168 | 0 | 0 | 83.8200 | 70.8700 | 53.8500 | 0 |
| Proportional Odds | 58.9905 | 0.3232 | 0.5258 | 0 | 0 | 72.0588 | 59.8425 | 33.3333 | 0 |
| Multinomial | 58.3596 | 0.3158 | 0.5227 | 0 | 20 | 74.2647 | 55.9055 | 28.2051 | 0 |
| Partial Proportional Odds | 58.9905 | 0.3233 | 0.5162 | 0 | 0 | 72.0588 | 60.6299 | 30.7692 | 0 |
| Linear Regression | 57.7300 | 0.2998 | 0.4996 | 0 | 0 | 67.6500 | 64.5700 | 23.0800 | 0 |
| Model | Prediction Accuracy | Kappa | Weighted Kappa | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| Random Forest | 67.2800 | 0.4862 | 0.6542 | 0 | 25.000 | 67.3500 | 80.8700 | 47.7300 | 42.86 | 0 |
| Linear Regression | 52.6100 | 0.2162 | 0.4211 | 0 | 0.000 | 39.8600 | 81.7800 | 22.1600 | 0.00 | 0 |
| Multinomial | 54.0900 | 0.2451 | 0.4121 | 0 | 9.375 | 51.2027 | 79.4989 | 15.9091 | 0.00 | 0 |
| Proportional Odds | 52.1472 | 0.2126 | 0.4053 | 0 | 0.000 | 45.3608 | 78.1321 | 19.8864 | 0.00 | 0 |
Using random forests, we predicted expert wine quality ratings fairly accurately. For white wines, the overall accuracy was 67.28%, with a weighted kappa of 0.6542. For red wines, the accuracy was 70.98% with weighted kappa 0.6168. Using the proposed cutoffs from Landis and Koch, these weighted kappa values suggest moderate to substantial agreement with the expert ratings.
The likelihood approaches performed more poorly. Accuracies were 10-15% lower than random forest and weighted kappas were 0.1 to 0.2 lower. Moreover, among the likelihood based models, predictive performance was similar and no individual approach stood out as superior. Generally, the models showed a tendency to predict ratings towards the middle of the distribution, and, interestingly, accounting for the ordered nature of the ratings did not make a substantial difference in performance. Indeed, the multinomial was more accurate than the proportional odds and linear regression models for very low and high quality wines. Overall, the weighted kappas for the likelihood based models suggested moderate agreement with expert ratings.
Our project has limitations. We only had data on a small number of chemical components, so we speculate some misclassification was due to failure to include other important chemicals in our models. Likewise, our datasets included only small numbers of extremely poor and very excellent wines making these difficult to predict. In the future, it would be worthwhile to examine whether various outlier detection algorithms are more suited to characterizing these wines than the approaches we took.
In terms of variable selection for the likelihood based models, we saw only a minor improvement in predictive performance for the full compared to reduced models suggesting we may have overfit the data. A regularization method like LASSO or elastic net with cross-validation may have been useful. Additionally, our modeling focus was purely on prediction, not on characterizing how individual chemicals relate to wine quality. An interesting extension of this project would more thoroughly investigate those relationships.
In summary, we found wine quality ratings can be predicted reasonably well based on their chemical components. Random forests performed better than likelihood based models, but all methods demonstrated at least moderate agreement with expert ratings. Nevertheless, the predictions were far from perfect, so true wine connoisseurs are still better off consulting a sommelier.
Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977 Mar;33(1):159-74.